cosmos: handle HTTP 403/sub-status 5300 (AAD_REQUEST_NOT_AUTHORIZED) by refreshing bearer token and retrying #46167

Open
Copilot wants to merge 15 commits into main from
copilot/fix-azure-cosmos-403-error

Contributor

Copilot AI commented Apr 6, 2026

  • Analyze issue: AsyncCosmosBearerTokenCredentialPolicy does not handle HTTP 403 with sub-status 5300 (AAD_REQUEST_NOT_AUTHORIZED) - only 401 is handled
  • Add send() override to AsyncCosmosBearerTokenCredentialPolicy in _auth_policy_async.py to clear cached token and retry on 403/5300
  • Add send() override to sync CosmosBearerTokenCredentialPolicy in _auth_policy.py for the same fix
  • Rewrite tests using realistic Pipeline/AsyncPipeline with MockTransport that returns proper requests.Response objects with headers
  • Tests verify Authorization header format (type=aad&ver=1.0&sig=<token>) in both initial and retry requests
  • Fix spelling: "retriable" → "retryable" in test docstrings
  • Update CHANGELOG.md with bug fix entry referencing PR 46167
  • Fix TypeError in test_aad_credentials: store retry flag in request.context (dict) instead of request.context.options (forwarded to transport as kwargs)
  • 14 tests passing (7 sync + 7 async)

bambriz and others added 3 commits April 6, 2026 14:16
…and async)

When Cosmos DB returns HTTP 403 with sub-status 5300 (AAD_REQUEST_NOT_AUTHORIZED),
the cached bearer token is now cleared and the request is retried with a fresh token.
This mirrors how the base class handles HTTP 401, and resolves the issue where
long-running services using managed identity would permanently fail after token expiry.

- Added send() override to CosmosBearerTokenCredentialPolicy (_auth_policy.py)
- Added send() override to AsyncCosmosBearerTokenCredentialPolicy (_auth_policy_async.py)
- Added unit tests for both sync and async policies
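
The control flow described above can be sketched without any SDK dependency. This is a minimal illustration, not the code in `_auth_policy.py`: the class name `RefreshOn403Policy`, the `(status, headers)` tuple used as a response, and the callable transport are all hypothetical stand-ins. Only the 403/5300 trigger, the `x-ms-substatus` header, and the `type=aad&ver=1.0&sig=<token>` format come from this PR.

```python
from typing import Callable, Dict, Optional, Tuple

# (status_code, response_headers): a hypothetical stand-in for an HTTP response.
Response = Tuple[int, Dict[str, str]]
Transport = Callable[[Dict[str, str]], Response]


class RefreshOn403Policy:
    """Illustrative control flow only; this is not the SDK class."""

    def __init__(self, get_token: Callable[[], str]) -> None:
        self._get_token = get_token
        self._token: Optional[str] = None  # cached bearer token

    def _authorize(self, headers: Dict[str, str]) -> None:
        # Fetch a token only when nothing is cached, then set the Cosmos AAD header.
        if self._token is None:
            self._token = self._get_token()
        headers["Authorization"] = f"type=aad&ver=1.0&sig={self._token}"

    def send(self, headers: Dict[str, str], transport: Transport) -> Response:
        self._authorize(headers)
        status, response_headers = transport(headers)
        # Retry exactly once, and only for 403 with sub-status 5300.
        if status == 403 and response_headers.get("x-ms-substatus") == "5300":
            self._token = None  # clear the cached token
            self._authorize(headers)  # re-authorize with a fresh token
            status, response_headers = transport(headers)
        return status, response_headers
```

Because the retry is a single straight-line second attempt, a second 403/5300 is returned to the caller as-is. The actual change tracks the "already retried" state with a flag, as discussed later in this thread.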

Agent-Logs-Url: https://github.com/Azure/azure-sdk-for-python/sessions/a5381531-6292-4e5e-be43-586d3267d980

Co-authored-by: bambriz <8497145+bambriz@users.noreply.github.com>
Copilot AI changed the title from "[WIP] Fix azure.cosmos client recovery from 403 on expired AAD token" to "cosmos: handle HTTP 403/sub-status 5300 (AAD_REQUEST_NOT_AUTHORIZED) by refreshing bearer token and retrying" Apr 6, 2026
Copilot AI requested a review from bambriz April 6, 2026 21:24
bambriz and others added 2 commits April 6, 2026 16:02
Replace MagicMock-based policy isolation tests with full azure-core
Pipeline/AsyncPipeline tests that use a real MockTransport (extending
HttpTransport/AsyncHttpTransport). This approach:

- Returns proper requests.Response objects with real header dicts,
  including x-ms-substatus for the 403/5300 trigger
- Records all outgoing requests so tests can assert on the exact
  Authorization header value (type=aad&ver=1.0&sig=<token>)
- Verifies both the Cosmos AAD header format and token freshness
  end-to-end through the full policy pipeline, matching the pattern
  used in azure-core's own authentication tests
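
The recording-transport idea can be sketched for the async side as follows. This is a simplified illustration under stated assumptions: `RecordingAsyncTransport` and `send_with_refresh` are hypothetical names, responses are plain `(status, headers)` tuples rather than `requests.Response` objects, and the transport does not extend `AsyncHttpTransport` as the PR's MockTransport does.

```python
import asyncio
from typing import Awaitable, Callable, Dict, List, Tuple

Response = Tuple[int, Dict[str, str]]  # hypothetical (status, headers) pair


class RecordingAsyncTransport:
    """Hypothetical mock: replays canned responses and records every request."""

    def __init__(self, responses: List[Response]) -> None:
        self._responses = list(responses)
        self.requests: List[Dict[str, str]] = []

    async def send(self, headers: Dict[str, str]) -> Response:
        # Snapshot the headers so later mutation cannot change what was recorded.
        self.requests.append(dict(headers))
        return self._responses.pop(0)


async def send_with_refresh(
    transport: RecordingAsyncTransport,
    get_token: Callable[[], Awaitable[str]],
) -> int:
    """Simplified code under test: one retry on 403 with sub-status 5300."""
    headers = {"Authorization": f"type=aad&ver=1.0&sig={await get_token()}"}
    status, response_headers = await transport.send(headers)
    if status == 403 and response_headers.get("x-ms-substatus") == "5300":
        headers["Authorization"] = f"type=aad&ver=1.0&sig={await get_token()}"
        status, _ = await transport.send(headers)
    return status
```

A test can then drive it with `asyncio.run(...)` and assert on `transport.requests` to check both the header format and that the retry carried a different token than the first attempt.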

Agent-Logs-Url: https://github.com/Azure/azure-sdk-for-python/sessions/a67aa68c-a3b6-42ef-824f-aa45415226d6

Co-authored-by: bambriz <8497145+bambriz@users.noreply.github.com>
Contributor Author

Copilot AI commented Apr 6, 2026

Just as a heads up, I was blocked by some firewall rules while working on your feedback. Expand below for details.

Warning

Firewall rules blocked me from connecting to one or more addresses (expand for details)

I tried to connect to the following addresses, but was blocked by firewall rules:

  • scanning-api.github.com
    • Triggering command: /home/REDACTED/work/_temp/ghcca-node/node/bin/node /home/REDACTED/work/_temp/ghcca-node/node/bin/node --enable-source-maps /home/REDACTED/work/_temp/copilot-developer-action-main/dist/index.js (dns block)

If you need me to access, download, or install something from one of these locations, you can either:

Copilot AI requested a review from bambriz April 6, 2026 23:10

Copilot AI requested a review from bambriz April 7, 2026 04:43
Member

bambriz commented Apr 7, 2026

@copilot Update the changelog to mention the PR and the fix applied.

Contributor Author

Copilot AI commented Apr 7, 2026

@copilot Update the changelog to mention the PR and the fix applied.

Done in commit f2daec1e — added a bug fix entry to the 4.16.0b3 (Unreleased) section of CHANGELOG.md referencing PR 46167.

@simorenoh simorenoh marked this pull request as ready for review April 9, 2026 19:04
@simorenoh simorenoh requested a review from a team as a code owner April 9, 2026 19:04
Copilot AI review requested due to automatic review settings April 9, 2026 19:04
Contributor

Copilot AI left a comment

Pull request overview

This PR updates the Cosmos AAD bearer token auth policies to recover from a Cosmos-specific auth failure pattern (HTTP 403 with sub-status 5300 / AAD_REQUEST_NOT_AUTHORIZED) by clearing the cached token and retrying once, and adds unit coverage for both sync and async pipelines.

Changes:

  • Add 403/5300 detection in both sync and async Cosmos bearer token credential policies to refresh token and retry once.
  • Add new sync/async unit tests using Pipeline/AsyncPipeline with mock transports to validate retry behavior and Authorization header formatting.
  • Add a changelog entry documenting the bug fix.

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| sdk/cosmos/azure-cosmos/azure/cosmos/_auth_policy.py | Adds sync send() override to retry once on 403 + substatus 5300 after clearing cached token. |
| sdk/cosmos/azure-cosmos/azure/cosmos/aio/_auth_policy_async.py | Adds async send() override with the same 403/5300 retry behavior. |
| sdk/cosmos/azure-cosmos/tests/test_auth_policy_unit.py | New sync unit tests verifying retry and Cosmos AAD Authorization header format. |
| sdk/cosmos/azure-cosmos/tests/test_auth_policy_unit_async.py | New async unit tests verifying retry and Cosmos AAD Authorization header format. |
| sdk/cosmos/azure-cosmos/CHANGELOG.md | Documents the 403/5300 retry fix under Bugs Fixed. |

simorenoh and others added 2 commits April 10, 2026 14:43
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
:return: The pipeline response object
:rtype: ~azure.core.pipeline.PipelineResponse
"""
retry_key = "_cosmos_aad_403_retried"
Member

@copilot What is the point of this retry_key? It raises TypeError: Session.request() got an unexpected keyword argument '_cosmos_aad_403_retried' in test_aad_credentials, which causes the tests to fail.

Contributor Author

Fixed in commit 36cd3987. The retry_key was being stored in request.context.options, which azure-core forwards as keyword arguments to the underlying Session.request() call — causing the TypeError. The flag is now stored directly in request.context (the PipelineContext dict), which is designed for pipeline-level metadata and is never forwarded to the transport.
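
The difference between the two storage locations can be illustrated with stand-ins that need no SDK installed. This is a sketch under stated assumptions: `FakeContext`, `fake_session_request`, and `run_transport` are hypothetical names that only mimic the behavior described above (a dict-like context with a separate options mapping that is forwarded to the transport as keyword arguments).

```python
from typing import Any, Optional

RETRY_KEY = "_cosmos_aad_403_retried"  # key name taken from the review thread


class FakeContext(dict):
    """Hypothetical stand-in: a dict for pipeline metadata plus an options mapping."""

    def __init__(self, **options: Any) -> None:
        super().__init__()
        self.options = options


def fake_session_request(url: str, *, timeout: Optional[float] = None) -> str:
    """Stand-in for Session.request(): unknown keyword arguments raise TypeError."""
    return f"GET {url} timeout={timeout}"


def run_transport(context: FakeContext, url: str) -> str:
    # Everything left in context.options is forwarded as keyword arguments.
    return fake_session_request(url, **context.options)


# Storing the flag in the context dict itself keeps it away from the transport.
safe = FakeContext(timeout=5)
safe[RETRY_KEY] = True
result = run_transport(safe, "https://example.invalid")

# Storing the flag in options leaks it into the transport call.
leaky = FakeContext(timeout=5)
leaky.options[RETRY_KEY] = True
try:
    run_transport(leaky, "https://example.invalid")
    leaked_error = None
except TypeError as exc:
    leaked_error = exc
```

In this sketch the first call succeeds, while the second fails with a TypeError about an unexpected keyword argument, which mirrors the failure reported for test_aad_credentials.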

Contributor Author

Copilot AI commented Apr 10, 2026

Just as a heads up, I was blocked by some firewall rules while working on your feedback. Expand below for details.

Warning

Firewall rules blocked me from connecting to one or more addresses (expand for details)

I tried to connect to the following addresses, but was blocked by firewall rules:

  • pypi.org
    • Triggering command: /home/REDACTED/work/azure-sdk-for-python/azure-sdk-for-python/.venv/bin/python /home/REDACTED/work/azure-sdk-for-python/azure-sdk-for-python/.venv/bin/python /home/REDACTED/work/azure-sdk-for-python/azure-sdk-for-python/.venv/lib/python3.9/site-packages/pip/__pip-REDACTED__.py install --ignore-installed --no-user --prefix /tmp/pip-build-env-qqyjlx6_/overlay --no-warn-script-location --no-binary :none: --only-binary :none: -i REDACTED -- setuptools>=40.8.0 (dns block)
  • scanning-api.github.com
    • Triggering command: /home/REDACTED/work/_temp/ghcca-node/node/bin/node /home/REDACTED/work/_temp/ghcca-node/node/bin/node --enable-source-maps /home/REDACTED/work/_temp/copilot-developer-action-main/dist/index.js (dns block)

If you need me to access, download, or install something from one of these locations, you can either:

Copilot AI requested a review from bambriz April 10, 2026 20:15
Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

azure-cosmos: async client does not recover from 403 on expired AAD token (bearer token policy only handles 401)

4 participants